Supplementary Material for DeWave: Discrete Encoding of EEG Waves for EEG to Text Translation

Neural Information Processing Systems

In this material, we give more technical details as well as additional experiments to support the main paper. An overview of the proposed framework, DeWave, is illustrated in Figure 6. The dataset is split into training (80%), development (10%), and testing (10%) sets, comprising 10,874, 1,387, and 1,387 unique sentences, respectively, with no overlap. We release our implementation code through GitHub to contribute to this area. In Section 3.3, a 6-layer CNN encoder slides through the whole wave and gets the embedding. The codex encoder shares the same structure with word-level features.
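As a rough illustration of the sentence-level split described above, the following is a minimal sketch, assuming a simple shuffle-and-slice over de-duplicated sentences. The function name `split_sentences`, the seed, and the rounding are hypothetical; the excerpt does not state the exact procedure, and the counts it reports (10,874 / 1,387 / 1,387) will not be reproduced exactly by this rounding.

```python
import random


def split_sentences(sentences, seed=0, train_frac=0.8, dev_frac=0.1):
    """Split sentences into disjoint train/dev/test lists (hypothetical helper)."""
    # De-duplicate first so that no sentence can appear in two splits.
    unique = sorted(set(sentences))
    rng = random.Random(seed)
    rng.shuffle(unique)

    n_train = int(len(unique) * train_frac)
    n_dev = int(len(unique) * dev_frac)

    train = unique[:n_train]
    dev = unique[n_train:n_train + n_dev]
    test = unique[n_train + n_dev:]  # whatever remains goes to the test set
    return train, dev, test


if __name__ == "__main__":
    toy = [f"sentence {i}" for i in range(100)]
    train, dev, test = split_sentences(toy)
    print(len(train), len(dev), len(test))
```

Splitting on unique sentences rather than on individual recordings is what keeps the three sets free of overlap when the same sentence was read by several subjects.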





Russian drone crashes in Polish field as Warsaw protests airspace violation and plans formal complaint

FOX News

Lt. Gen. Keith Kellogg discusses the latest with the Ukraine and Russia war after a deadly Russian attack on 'America Reports.' A Russian drone may have crashed in a field in Poland, a move the country's deputy prime minister called a "provocation," as the United States and European leaders continue to push Moscow to end its war in Ukraine. The drone hit a cornfield in the village of Osiny in the eastern Lublin province, about 62 miles from Poland's border with Ukraine, Reuters reported. Deputy Prime Minister Wladyslaw Kosiniak-Kamysz, who also serves as defense minister, said Wednesday's incident was similar to cases in which Russian drones flew into Lithuania and Romania, and could be linked to efforts to end the war in Ukraine, according to the outlet. Polish police secure the area of a cornfield where an unidentified flying object has crashed and exploded in the country's east in Osiny on Wednesday.


TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP

Cai, Yuliang, Thomason, Jesse, Rostami, Mohammad

arXiv.org Artificial Intelligence

Vision-language models (VLMs), such as CLIP, have demonstrated strong performance across a range of downstream tasks. However, CLIP is still limited in negation understanding: the ability to recognize the absence or exclusion of a concept. Existing methods address the problem by using a large language model (LLM) to generate large-scale data of image captions containing negation for further fine-tuning CLIP. However, these methods are both time- and compute-intensive, and their evaluations are typically restricted to image-text matching tasks. To expand the horizon, we (1) introduce a training-time negation data generation pipeline in which negation captions are generated during the training stage, adding only 2.5% extra training time, and (2) propose the first benchmark, Neg-TtoI, for evaluating text-to-image generation models on prompts containing negation, assessing a model's ability to produce semantically accurate images. We show that our proposed method, TNG-CLIP, achieves SOTA performance on diverse negation benchmarks of image-to-text matching, text-to-image retrieval, and image generation.


WavePulse: Real-time Content Analytics of Radio Livestreams

Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay

arXiv.org Artificial Intelligence

Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.


From MLP to NeoMLP: Leveraging Self-Attention for Neural Fields

Kofinas, Miltiadis, Papa, Samuele, Gavves, Efstratios

arXiv.org Machine Learning

Neural fields (NeFs) have recently emerged as a state-of-the-art method for encoding spatio-temporal signals of various modalities. Despite the success of NeFs in reconstructing individual signals, their use as representations in downstream tasks, such as classification or segmentation, is hindered by the complexity of the parameter space and its underlying symmetries, in addition to the lack of powerful and scalable conditioning mechanisms. In this work, we draw inspiration from the principles of connectionism to design a new architecture based on MLPs, which we term NeoMLP. We start from an MLP, viewed as a graph, and transform it from a multi-partite graph to a complete graph of input, hidden, and output nodes, equipped with high-dimensional features. We perform message passing on this graph and employ weight-sharing via self-attention among all the nodes. NeoMLP has a built-in mechanism for conditioning through the hidden and output nodes, which function as a set of latent codes, and as such, NeoMLP can be used straightforwardly as a conditional neural field. We demonstrate the effectiveness of our method by fitting high-resolution signals, including multi-modal audio-visual data. Furthermore, we fit datasets of neural representations by learning instance-specific sets of latent codes using a single backbone architecture, and then use them for downstream tasks, outperforming recent state-of-the-art methods. The source code is available at https://github.com/mkofinas/neomlp. The omnipresence of neural networks in the last decade has recently given rise to neural fields (NeFs). Consequently, the popularity of neural fields has spurred interest in neural representations, i.e., using NeFs as representations for a wide range of downstream tasks. Existing neural representations, however, suffer from notable drawbacks.


Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder

Wang, Jiaqi, Song, Zhenxi, Ma, Zhengyu, Qiu, Xipeng, Zhang, Min, Zhang, Zhiguo

arXiv.org Artificial Intelligence

Reconstructing natural language from non-invasive electroencephalography (EEG) holds great promise as a language decoding technology for brain-computer interfaces (BCIs). However, EEG-based language decoding is still in its nascent stages, facing several technical issues such as: 1) the absence of a hybrid strategy that can effectively integrate cross-modality (between EEG and text) self-learning with intra-modality self-reconstruction of EEG features or textual sequences; 2) under-utilization of large language models (LLMs) to enhance EEG-based language decoding. To address the above issues, we propose the Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text through a dedicated multi-stream encoder. Furthermore, we develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations), which leverages pre-trained modules alongside the EEG stream from CET-MAE and further enables an LLM (specifically BART) to decode text from EEG sequences. Comprehensive experiments conducted on the popular text-evoked EEG database, ZuCo, demonstrate the superiority of E2T-PTR, which outperforms the state-of-the-art in ROUGE-1 F1 and BLEU-4 scores by 8.34% and 32.21%, respectively. These results indicate significant advancements in the field and underscore the proposed framework's potential to enable more powerful and widespread BCI applications.
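To make the cross-modality contrastive idea concrete, here is a minimal sketch of a symmetric InfoNCE-style loss between paired EEG and text embeddings. This is an assumption for illustration only: the abstract does not specify the exact contrastive objective used in CET-MAE, and the function names, the temperature value, and the toy embeddings below are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_info_nce(eeg_embs, text_embs, temperature=0.07):
    """Assumed objective: average of EEG-to-text and text-to-EEG InfoNCE terms."""
    eeg = [l2_normalize(v) for v in eeg_embs]
    txt = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every EEG/text pair, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(e, t)) / temperature for t in txt]
              for e in eeg]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-EEG direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


if __name__ == "__main__":
    eeg = [[1.0, 0.0], [0.0, 1.0]]
    aligned_text = [[0.9, 0.1], [0.1, 0.9]]
    swapped_text = [[0.1, 0.9], [0.9, 0.1]]
    print("aligned loss:", symmetric_info_nce(eeg, aligned_text))
    print("swapped loss:", symmetric_info_nce(eeg, swapped_text))
```

The loss is small when each EEG embedding is closest to its own sentence embedding and grows when the pairing is swapped, which is the property a contrastive pre-training stage relies on.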


Simplicits: Mesh-Free, Geometry-Agnostic, Elastic Simulation

Modi, Vismay, Sharp, Nicholas, Perel, Or, Sueda, Shinjiro, Levin, David I. W.

arXiv.org Artificial Intelligence

The proliferation of 3D representations, from explicit meshes to implicit neural fields and more, motivates the need for simulators agnostic to representation. We present a data-, mesh-, and grid-free solution for elastic simulation for any object in any geometric representation undergoing large, nonlinear deformations. We note that every standard geometric representation can be reduced to an occupancy function queried at any point in space, and we define a simulator atop this common interface. For each object, we fit a small implicit neural network encoding spatially varying weights that act as a reduced deformation basis. These weights are trained to learn physically significant motions in the object via random perturbations. Our loss ensures we find a weight-space basis that best minimizes deformation energy by stochastically evaluating elastic energies through Monte Carlo sampling of the deformation volume. At runtime, we simulate in the reduced basis and sample the deformations back to the original domain. Our experiments demonstrate the versatility, accuracy, and speed of this approach on data including signed distance functions, point clouds, neural primitives, tomography scans, radiance fields, Gaussian splats, surface meshes, and volume meshes, as well as showing a variety of material energies, contact models, and time integration schemes.
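The occupancy-function interface and the Monte Carlo evaluation mentioned above can be sketched in a few lines. This is a minimal sketch, assuming uniform sampling inside an axis-aligned bounding box; the names `sphere_occupancy` and `monte_carlo_integral` are hypothetical, and the integrand is a placeholder for an energy density rather than the elastic energy used in the paper.

```python
import random


def sphere_occupancy(p, radius=1.0):
    """Occupancy of a solid sphere centred at the origin: 1 inside, 0 outside."""
    x, y, z = p
    return 1.0 if x * x + y * y + z * z <= radius * radius else 0.0


def monte_carlo_integral(occupancy, integrand, bounds, n_samples=20000, seed=0):
    """Estimate the integral of `integrand` over the occupied region.

    Points are drawn uniformly from the bounding box `bounds`; samples that
    fall outside the object contribute zero through the occupancy weight.
    """
    rng = random.Random(seed)
    (x0, x1), (y0, y1), (z0, z1) = bounds
    box_volume = (x1 - x0) * (y1 - y0) * (z1 - z0)
    total = 0.0
    for _ in range(n_samples):
        p = (rng.uniform(x0, x1), rng.uniform(y0, y1), rng.uniform(z0, z1))
        total += occupancy(p) * integrand(p)
    return box_volume * total / n_samples


if __name__ == "__main__":
    box = ((-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0))
    # With a constant integrand of 1 the estimate approximates the volume.
    volume = monte_carlo_integral(sphere_occupancy, lambda p: 1.0, box)
    print("estimated volume of the unit sphere:", volume)
```

Because the estimator only ever queries occupancy at sampled points, the same routine works unchanged for any representation that can answer "is this point inside the object?", which is the property the abstract relies on.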